Learning Sequential Structure in Simple Recurrent Networks

Neural Information Processing Systems

We explore a network architecture introduced by Elman (1988) for predicting successive elements of a sequence. The network uses the pattern of activation over a set of hidden units from time-step t-1, together with element t, to predict element t+1. When the network is trained with strings from a particular finite-state grammar, it can learn to be a perfect finite-state recognizer for the grammar. Cluster analyses of the hidden-layer patterns of activation showed that they encode prediction-relevant information about the entire path traversed through the network. We illustrate the phases of learning with cluster analyses performed at different points during training.
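The prediction step described in the abstract can be sketched as follows. This is a minimal, untrained forward pass only, not the authors' implementation: the class name `SimpleRecurrentNetwork`, the weight names, the logistic activation, the weight range, and the all-zero initial context are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    """Logistic activation (an assumed choice of nonlinearity)."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    """Small random weights; the range [-0.5, 0.5] is an assumption."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SimpleRecurrentNetwork:
    """Hypothetical Elman-style network: forward pass only, no training."""

    def __init__(self, n_symbols, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.W_in = random_matrix(n_hidden, n_symbols, rng)   # element t -> hidden
        self.W_ctx = random_matrix(n_hidden, n_hidden, rng)   # hidden(t-1) -> hidden
        self.W_out = random_matrix(n_symbols, n_hidden, rng)  # hidden -> prediction

    def step(self, x_t, h_prev):
        """Combine element t with the hidden pattern from t-1 to predict t+1."""
        from_input = matvec(self.W_in, x_t)
        from_context = matvec(self.W_ctx, h_prev)
        h_t = [sigmoid(a + c) for a, c in zip(from_input, from_context)]
        y_t = [sigmoid(o) for o in matvec(self.W_out, h_t)]
        return y_t, h_t

    def run(self, sequence):
        """Feed one-hot vectors in order; return all predictions and the final hidden state."""
        h = [0.0] * self.n_hidden  # assumed initial context
        predictions = []
        for x_t in sequence:
            y_t, h = self.step(x_t, h)
            predictions.append(y_t)
        return predictions, h


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


net = SimpleRecurrentNetwork(n_symbols=5, n_hidden=3)
# Two sequences that end in the same element but arrive by different paths.
_, h_a = net.run([one_hot(0, 5), one_hot(1, 5)])
_, h_b = net.run([one_hot(2, 5), one_hot(1, 5)])
print(h_a != h_b)  # the hidden pattern carries a trace of the earlier element
```

Because the hidden pattern at t-1 is fed back as input, two sequences ending in the same element generally leave different hidden states, which is the property the cluster analyses examine.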


This tendency to preserve information about the path is not a characteristic of traditional finite-state automata.

ENCODING PATH INFORMATION

In a different set of experiments, we asked whether the SRN could learn to use the information about the path that is encoded in the hidden units' patterns of activation. In one of these experiments, we tested whether the network could master length constraints. When strings generated from the small finite-state grammar are limited to a maximum of 8 letters, the prediction following the presentation of the same letter in position six or seven may be different. For example, following the sequence 'TSSSXXV', 'V' is the seventh letter and only another 'V' would be a legal successor.
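The length constraint can be illustrated with a small successor check. The excerpt does not give the grammar, so the transition table below is an assumption: a Reber-style grammar that happens to be consistent with the 'TSSSXXV' example. The names `TRANSITIONS`, `min_letters_to_accept`, and `legal_successors` are hypothetical, and begin/end markers are left out.

```python
from collections import deque

# ASSUMED transition table (Reber-style grammar); not given in the excerpt.
# state -> {letter: next_state}; state 5 is the accepting state.
TRANSITIONS = {
    0: {"T": 1, "P": 2},
    1: {"S": 1, "X": 3},
    2: {"T": 2, "V": 4},
    3: {"X": 2, "S": 5},
    4: {"P": 3, "V": 5},
    5: {},
}
ACCEPT = 5


def min_letters_to_accept(state):
    """Fewest further letters needed to reach ACCEPT (breadth-first search)."""
    queue = deque([(state, 0)])
    seen = {state}
    while queue:
        current, dist = queue.popleft()
        if current == ACCEPT:
            return dist
        for nxt in TRANSITIONS[current].values():
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("accepting state unreachable")


def legal_successors(prefix, max_len=None):
    """Letters that may follow prefix; with max_len, the string must still be able to finish in time."""
    state = 0
    for letter in prefix:
        state = TRANSITIONS[state][letter]
    legal = set()
    for letter, nxt in TRANSITIONS[state].items():
        letters_left = None if max_len is None else max_len - len(prefix) - 1
        if letters_left is None or min_letters_to_accept(nxt) <= letters_left:
            legal.add(letter)
    return legal


print(sorted(legal_successors("TSSXXV", max_len=8)))   # 'V' in position six  -> ['P', 'V']
print(sorted(legal_successors("TSSSXXV", max_len=8)))  # 'V' in position seven -> ['V']
print(sorted(legal_successors("TSSSXXV")))             # no length limit       -> ['P', 'V']
```

Both prefixes end in the same grammar state, so a recognizer that tracks only the current state cannot tell them apart. Making the correct prediction requires knowing how many letters have already been seen, which is path information.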


Several connectionist architectures that are explicitly constrained to capture sequential information have been developed. Examples are Time Delay Networks (e.g.